feat: optimize CASE WHEN for divide-by-zero protection pattern#19994
feat: optimize CASE WHEN for divide-by-zero protection pattern#19994CuteChuanChuan wants to merge 6 commits intoapache:mainfrom
Conversation
267d8df to
334785f
Compare
|
run benchmark tpcds |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Hi @andygrove , |
|
run benchmark tpcds |
|
Also FYI @pepijnve |
|
🤖 |
|
It might be useful to add a microbenchmark to https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/benches/case_when.rs so we can compare before/after. I don't think any of the existing ones will cover this particular pattern. The TCP-DS queries that contain this pattern (based on |
|
🤖: Benchmark completed Details
|
- Adds a specialization for the common pattern: CASE WHEN y > 0 THEN x / y ELSE NULL END - Add EvalMethod::DivideByZeroProtection variant - Add pattern detection in find_best_eval_method() - Implement safe_divide using Arrow kernels - Handle CastExpr wrapping on divisor
334785f to
6479884
Compare
|
Microbenchmark result:
|
|
Hi @pepijnve , thanks for pointing out the need for a microbenchmark. I compared |
| } | ||
| } | ||
|
|
||
| fn benchmark_divide_by_zero_protection(c: &mut Criterion, batch_size: usize) { |
There was a problem hiding this comment.
Could you please pull this additional benchmark code code into its own PR we could then use our benchmarking scripts to compare performance of this PR with main?
Thank you 🙏
| }, | ||
| ); | ||
|
|
||
| group.bench_function( |
There was a problem hiding this comment.
I'm trying to spot the difference between this benchmark and the one just above it, they look identical to me. Yet the measurements seem to produce similar values. What causes that?
There was a problem hiding this comment.
Hi @pepijnve ,
The key difference is in which column is used in the WHEN condition:
- First benchmark (
DivideByZeroProtection): checks divisor_col > 0, and since divisor_col is also the divisor in numerator / divisor_col, this matches the pattern and triggers the optimization. - Second benchmark (
ExpressionOrExpression): checks divisor_copy_col > 0, but the division uses divisor_col. Since the checked column doesn't match the divisor, the optimization is not triggered and it falls back to ExpressionOrExpression.
I'll also add a comment in the code to make this distinction clearer.
There was a problem hiding this comment.
Ah of course. That's really subtle. A comment for future readers is indeed a good idea.
There was a problem hiding this comment.
A followup question I had, but I'm happy to measure that myself, is what's causing the performance gap? Might be interesting to see if there's something we could do to improve ExprOrExpr for this particular pattern.
There was a problem hiding this comment.
Hi @pepijnve,
I think the performance gap might come from execution model differences. Please correct me if I am wrong.
ExprOrExpr: evaluate condition → build selection → execute on selected rows → merge
DivideByZeroProtection: fully vectorized Arrow kernels
The vectorized path avoids selection/branching overhead, which is why it wins when most rows need computation (low zero density). However, ExprOrExpr's short-circuit helps when many rows are filtered (high zero density).
Although I don't have concrete ideas about potential ExprOrExpr improvements for this pattern yet, I'd be happy to explore further if you have any suggestions!
There was a problem hiding this comment.
I've been studying the results a bit. I'll add some extra comments on the relevant bits of code.
There was a problem hiding this comment.
There is another PR here from @pepijnve to improve some cases
|
Benchmark was added in #20076 |
## Which issue does this PR close? Related to #19994 - This PR extracts the benchmark code to allow performance comparison. ## Rationale for this change As pointed out by @alamb in #19994, this separates the microbenchmark code so that the benchmarking scripts can compare the optimization PR against main with the benchmark already in place. ## What changes are included in this PR? Adds a microbenchmark for the divide-by-zero protection pattern in `case_when.rs`: - Benchmarks with varying percentages of zeros (0%, 10%, 50%, 90%) - Compares `DivideByZeroProtection` pattern (where checked column matches divisor) vs `ExpressionOrExpression` fallback (where they don't match) ## Are these changes tested? benchmark code only. ## Are there any user-facing changes? No.
|
I merged up to get the new benchmarks so I could run them on our runner |
|
run benchmark case_when |
This comment was marked as outdated.
This comment was marked as outdated.
|
run benchmark case_when |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
@pepijnve |
|
run benchmark case_when |
|
🤖 |
|
running with updated benchmarks |
|
run benchmark case_when |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Divide by zero protection is looking pretty good: For some reason the CI is now failing on this PR When #20097 is merged, then we can run the benchmarks again and see how much additional boost this special case gives us |
That's due to #19994 (comment) |
…0097) ## Which issue does this PR close? - Related to #11570 ## Rationale for this change While reviewing #19994 it became clear the optimised `ExpressionOrExpression` code path was not being used when the case expression has no `else` branch or has `else null`. In those situations the general evaluation strategies could end up being used. This PR refines the `ExpressionOrExpression` implementation to also handle `else null` expressions. ## What changes are included in this PR? Use `ExpressionOrExpression` for expressions of the form `CASE WHEN x THEN y [ELSE NULL]` ## Are these changes tested? Covered by existing SLTs ## Are there any user-facing changes? No
4319634 to
0a8f085
Compare
|
run benchmark case_when |
This comment was marked as outdated.
This comment was marked as outdated.
|
🤖 |
|
🤖: Benchmark completed Details
|
|
It seems that effect of Divide by zero protection is not significant.🤣 |
Those last results are in line with what I was seeing locally. You get a speedup when there is a small percentage of zeroes, but a slow down when there are many. Not sure what the wisest choice is for this one. Personally I think I’m inclined to keep focussing on squeezing more out of the general pattern code paths than adding very specialized ones. Perhaps with some more profiling and optimisation work we can shave some more off the general 10% variant for instance? I haven’t looked at that closely yet. |
Closing this PR and pursuing more general optimization seems more reasonable. Thanks @alamb and @pepijnve for very informative feedback and discussion.❤️ |
|
@CuteChuanChuan just wanted to second the positive sentiment expressed by @alamb. Taking a closer look at this with the profiler showed that my assumptions wrt which code path was being taken were wrong and that there was still some low hanging fruit left in the general code paths. I've left some pointers in #11570 (comment) regarding possible future work. We might be able to squeeze some better performance out of |
|
Thanks again @pepijnve and @CuteChuanChuan |
I'd like to give it a try. |
Which issue does this PR close?
Rationale for this change
The
CaseExprimplementation is expensive. A common usage pattern (particularly in TPC-DS benchmarks) is to protect against divide-by-zero:This entire expression can be replaced with a simpler divide operation that returns NULL when the divisor is zero, avoiding the overhead of full CASE evaluation.
What changes are included in this PR?
Are these changes tested?
Yes, added two new tests:
Are there any user-facing changes?
No. This is an internal optimization that produces the same results but with better performance.